Topic Modeling in Financial Documents
Abstract
This paper describes the application of topic modeling techniques to quarterly earnings call transcripts of publicly traded companies. Earnings call transcripts are an interesting case for analysis because they are relatively unstructured and potentially more informative than 10-K and 10-Q disclosures: the question-and-answer session consists of unprepared statements. This paper addresses both the clustering of these documents and the segmentation of individual documents into clusters for the products and industries in which the company is active. The goal is to assign each transcript to some number of topics and to identify the specific segments of the transcript that address each topic. Thus, not only are the documents classified as covering some set of topics, but each document is itself partitioned into sub-topics. I discuss the progress made toward these goals as well as the challenges and issues that remain. This work could prove useful in financial document summarization and in improving the search and display of documents and information relevant to a user's search and interests. Furthermore, trading firms and hedge funds increasingly apply NLP and machine learning to financial document analysis to gain a competitive advantage.
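The two-level goal described above (topics per transcript, and topics per segment within a transcript) can be illustrated with a small sketch. The abstract does not say which topic model is used, so everything here is an assumption: the choice of latent Dirichlet allocation (LDA) fitted by collapsed Gibbs sampling, the helper name `fit_lda`, the hyperparameters, and the invented toy segments, which stand in for speaker turns of a single transcript.

```python
import random
from collections import Counter


def fit_lda(docs, num_topics, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Hypothetical helper: LDA via collapsed Gibbs sampling.

    docs is a list of token lists. Returns (theta, topic_word), where
    theta[d][k] is the estimated share of topic k in document d.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]      # tokens of doc d in topic k
    topic_word = [Counter() for _ in range(num_topics)]  # count of word w in topic k
    topic_total = [0] * num_topics                    # total tokens in topic k
    assignments = []

    # Random initial topic for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Resample its topic from the collapsed conditional.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    theta = [
        [(c + alpha) / (len(doc) + num_topics * alpha) for c in counts]
        for counts, doc in zip(doc_topic, docs)
    ]
    return theta, topic_word


# Invented toy segments (not real transcript text), one per speaker turn.
segments = [
    "cloud revenue grew as enterprise customers expanded cloud subscriptions",
    "cloud subscriptions and enterprise renewals drove revenue growth",
    "handset shipments fell as retail demand for handset models weakened",
    "retail inventory of handset models rose while shipments declined",
]
NUM_TOPICS = 2  # assumed; in practice this would need tuning
tokens = [s.split() for s in segments]
theta, topic_word = fit_lda(tokens, NUM_TOPICS)

# Segment level: label each segment with its dominant topic.
labels = [max(range(NUM_TOPICS), key=lambda k: row[k]) for row in theta]
# Transcript level: the set of topics covered by any of its segments.
transcript_topics = sorted(set(labels))

for segment, label in zip(segments, labels):
    print(label, segment)
print("transcript topics:", transcript_topics)
```

Treating each segment as its own short document is one simple way to obtain both levels from a single model; other designs, such as fitting on whole transcripts and then scoring segments, are equally plausible readings of the abstract.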
Similar resources
A review of text mining approaches and their function in discovering and extracting a topic
Background and aim: Four text mining methods are examined and focused on understanding and identifying their properties and limitations in subject discovery. Methodology: The study is an analytical review of the literature of text mining and topic modeling. Findings: LSA could be used to classify specific and unique topics in documents that address only a single topic. The other three text min...
Automatic keyword extraction using Latent Dirichlet Allocation topic modeling: Similarity with golden standard and users' evaluation
Purpose: This study investigates the automatic keyword extraction from the table of contents of Persian e-books in the field of science using LDA topic modeling, evaluating their similarity with golden standard, and users' viewpoints of the model keywords. Methodology: This is a mixed text-mining research in which LDA topic modeling is used to extract keywords from the table of contents of sci...
Interactive Visual Exploration of Topic Models using Graphs
Probabilistic topic modeling is a popular and powerful family of tools for uncovering thematic structure in large sets of unstructured text documents. While much attention has been directed towards the modeling algorithms and their various extensions, comparatively few studies have concerned how to present or visualize topic models in meaningful ways. In this paper, we present a novel design th...
Modeling Tag Dependencies in Tagged Documents
We present a general approach for modeling tagged documents with topic models. This approach extends related topic models by exploiting the dependencies between tags. We show how this model improves performance in a prediction task where the goal is to predict missing tags for new documents. Predictions also compare favorably with SVMs.
Markov Random Topic Fields
Most approaches to topic modeling assume an independence between documents that is frequently violated. We present a topic model that makes use of one or more user-specified graphs describing relationships between documents. These graphs are encoded in the form of a Markov random field over topics and serve to encourage related documents to have similar topic structures. Experiments on show upw...
Publication date: 2010